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1 . Introduction 

It  is  generally  recognized  today  that  the  "software  problem," 
more  than  any  other  factor,  is  the  principal  impediment  to  a far 
wider  usage  of  computers  in  many  applications.  What  is  the 
software  problem?  Briefly,  it  is  the  high  cost  and  the  poor 
quality  of  much  of  the  software  produced  today.  Software  is  expen- 
sive to  construct  and  rarely  delivered  on  time;  as  the  cost  of 
computer  hardware  continues  to  fall,  software  costs  will  increasingly 
dominate  the  economics  of  computer  applications.  Software  is 
unreliable;  it  works  incorrectly  much  of  the  time  and  sometimes 
fails  completely.  Software  is  unresponsive  to  the  needs  of  the 
user,  and  does  not  adapt  well  to  changing  user  demands, 
since  it  is  difficult  to  understand  and  to  modify.  Conventional 
techniques,  methodologies,  and  tools  for  software  production  have 
shown  themselves  to  be  inadequate  for  the  demands  of  large  and 
complex  application  systems.  This  situation  is  well  appreciated  by 
government,  industry,  and  the  universities,  and  there  is  a great 
deal  of  current  research  being  conducted,  whose  goal  is  the  ameliora- 
ion  of  the  software  problem. 

Tri^£his  document /"■"'We  shall  endeavor  to^provide*  a broad  thematic 
overview  of  proposed  approaches  to  solving  the  software  problem 
We  will  survey  what  we  feel  to  be  the  most  important  research  being 
done  today  and  also  those  emerging  research  directions  that  will  be 
of  prime  significance  in  the  future.  Within  each  research  area,  we 
will  indicate  representative  projects  that  are  now  underway. 

In  a sense,  all  computer  science  research  may  be  viewed  as 
ultimately  contributing  to  the  solution  of  the  software  problem. 

Every  advance  in  the  technology  of  computer  usage,  or  deeper  under- 
standing of  the  nature  of  computation,  will  eventually  bear  fruit 
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in  terms  of  less  expensive  and  more  reliable  computer  systems. 

But  for  the  purposes  of  this  survey,  we  shall  concentrate  on  two 
areas  of  current  research  that  are  directly  oriented  to  the 
production  of  better  software.  For  convenience,  we  shall  identify 
these  two  areas  as  "software  engineering"  and  "automatic  programming. 
These  terms  have  been  misused  and  overused,  but  we  believe  they 
roughly  characterize  the  two  most  important  themes  in  contemporary 
software  research. 

Software  engineering  takes  as  its  goal  the  development  of 
principles,  guidelines,  methodologies,  and  standards  for  the  construe 
tion  of  computer  systems  (especially  large  and  complex  ones);  in 
short,  the  steps  needed  to  turn  programming  from  a haphazard 
and  mysterious  process  into  a disciplined  and  systematic  one. 

Research  in  this  area  can  be  described  as  developing  engineering 
techniques,  and  designing  software  development  tools  (such  as 
programming  languages)  that  embody  them.  Automatic  programming 
seeks  to  transfer  responsibility  for  programming  tasks  from  the 
human  to  the  computer:  in  this  way,  the  programmer's  task  is 

lessened,  and  the  decisions  that  he  must  make  are  reduced  in 
number  and  scope,  so  that  the  programming  process  becomes  both 
easier  and  less  error-prone.  Research  in  this  area  is  concerned 
with  studying  how  to  automate  increasingly  large  segments  of  the 
software  production  process. 

We  expect  that  both  software  engineering  and  automatic  pro- 
gramming (especially  the  particular  research  topics  surveyed  in 
this  paper)  will  have  a major  impact  on  the  software  problem. 


Indeed,  these  two  approaches  are  complementary,  although  they  are 
based  on  conflicting  assumptions  as  to  whether  people  or  programs 
are  primarily  responsible  for  producing  software.  In  the  near 
future,  almost  all  programming  will  continue  to  be  done  by  pro- 
grammers- only  a few,  well-understood  problem  domains  will  be  sup- 
ported by  automatic  programming  systems.  Consequently,  the  results 
of  software  engineering  research  can  and  will  have  major  impact 
in  the  near  term.  In  the  longer  term,  most  application  domains 
will  be  eventually  supported  by  automatic  programming  systems, 
which  will  lower  the  level  of  expertise  needed  for  computer  usage. 
However,  software  engineering  will  continue  to  be  important  for 
the  novel  applications  that  will  inevitably  arise.  In  addition, 
automatic  programming  systems  themselves  will  be  affected  by 
principles  of  software  engineering,  both  in  their  own  structure  and 
in  the  structure  of  the  programs  they  produce. 
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2 . Software  Engineering 
2 . 1 Introduction 

Software  engineering  encompasses  a number  of  general,  domain- 
independent  methods  of  coping  with  the  software  problem.  The  approach 
is  predicated  on  the  belief  that  programmers  will  continue  to  be 
responsible  for  the  construction  of  software  systems,  and  that 
tools  and  techniques  can  be  devised  that  will  enable  them  to  con- 
struct programs  of  better  quality:  more  reliable,  understandable,  modi- 
fiable and  maintainable.  The  methods  to  be  described  are  derived  from 
the  study  of  programming  methodology,  in  which  the  process  of 
programming  and  the  structure  of  programs  are  studied  in  order  to  iden- 
tify techniques  and  structures  conducive  to  better  quality  software. 
Study  of  the  process  of  programming  has  led  to  such  concepts  as 
stepwise  ref inement , ^ ^ ^ ^ and  in  addition  to  the  recognition  that 
removal  of  the  goto  statement  leads  to  program  structures  that 

are  simpler  and  more  understandable.  Study  of  the  structure  of 

(4) 

programs  has  led  to  the  concept  of  modularity  and  to  the  identi- 
fication of  the  multi-procedure  module  as  a particularly  useful 

(5  6 ) 

component  in  software  construction. 

Programming  methodology  is  a relatively  new  area  of  computer 
science,  but  it  has  already  had  a substantial  impact  on  the  way 
that  programmers  think  about  programming.  This  impact  can  be 
observed  in  the  current  furor  over  "structured  programming" 
and  in  the  proliferation  of  papers  describing,  for  example,  struc- 
tured FORTRAN.  Although  such  activity  is  beneficial  to  software 
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that  we  believe  will  contribute  to  our  understanding  of  the  pro- 
gramming process  and  lead  to  further  improvements  in  the  quality 
of  software. 

The  research  areas  to  be  discussed  can  be  related  by  considering 
how  an  ideal  method  of  program  construction  would  proceed.  The 
method  is  to  solve  problems  by  means  of  decomposition  based  on 
recognition  of  useful  abstractions.  Starting  with  a problem 

described  by  means  of  a specification,  the  following  four  steps  are 

- , (7) 

performed : 

1)  identify  a number  of  data  and  functional  abstractions 
that  are  useful  for  solving  the  problem; 

2)  give  a specification  describing  each  abstraction  '-ecisely; 

3)  write  a program  that  uses  the  abstractions  to  solve  the 
original  problem; 

4)  verify  that  the  program  really  solves  the  problem. 

After  the  four  steps  have  been  completed,  one  of  the  abstractions 
is  selected  as  a new  problem  and  the  process  is  performed  again. 

The  process  terminates  when  all  abstractions  not  supported  by 
programs  are  supported  by  the  programming  language  in  use.  Note 
that  although  the  process  described  above  is  top-down,  this  is  not 
essential.  Once  an  abstraction  has  been  specified,  it  is  possible 
to  consider  it  as  a new  problem  immediately,  and  return  to  steps 
3 and  4 of  the  original  problem  later. 

In  the  remainder  of  this  section,  we  will  discuss  research  aimed 
at  supporting  this  process.  We  will  discuss  research  in  pro- 
gramming languages,  specification  techniques  and  program  verifi- 
cation. We  will  also  discuss  tools  and  techniques  for  managing 
the  program  construction  process. 
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2 . 2 Programming  Languages 

The  most  essential  tool  in  program  construction  is  the  pro- 
gramming language:  the  language  influences  the  way  that  the  pro- 

grammer thinks  about  problem  solving,  and  in  addition,  a properly 
defined  language  can  simplify  the  transition  from  program  design 
to  implementation.  In  this  section  we  discuss  programming  language 
research  that  we  believe  will  enhance  program  quality.  Areas  to 
be  discussed  are:  data  abstraction,  modular  programming,  parallelism, 

control  abstractions,  exception  handling,  and  program  optimization. 

An  important  way  in  which  a programming  language  can  contribute 
to  the  program  construction  process  described  above  is  by  providing 
the  proper  abstraction  mechanisms,  so  that  the  abstractions  identi- 
fied during  design  can  be  realized  in  a program  in  a natural  way. 

Two  kinds  of  abstractions  were  mentioned  earlier:  functional  and 

data  abstractions.  Conventional  programming  languages  provide 
procedures  or  subroutines  for  realizing  functional  abstractions, 
but  have  no  analogous  mechanism  for  supporting  data  abstractions. 

A recent  significant  development  in  programming  language  research 

. (8) 

has  been  the  attention  focused  on  the  concept  of  a data  abstraction, 
and  on  linguistic  constructs  for  supporting  this  concept.  In  this  work, 
a data  abstraction  or  abstract  data  type  is  viewed  as  consisting  of 
a set  of  objects  or  values  together  with  a set  of  operations  to 
manipulate  the  objects:  all  uses  of  the  objects  are  accomplished 

through  invocations  of  the  operations.  The  benefit  gained  by 
restricting  manipulations  to  the  operations  is  that  implementation 
information  need  be  accessible  only  to  the  operations?  in  this  way, 
the  separation  of  use  from  implementation  that  is  provided  by  proce- 
dures for  functional  abstractions  is  also  possible  for  data. 
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In  order  to  fully  realize  the  benefits  obtainable  from  data 

abstractions,  new  linguistic  constructs  supporting  abstract  type 

definitions  have  been  developed.  These  mechanisms  permit  a type 

to  be  implemented  as  a unit  inciiding  a storage  representation 

for  objects  of  the  type  and  algorithms  for  each  of  the  type's 

operations.  The  constraint  that  objects  of  the  type  may  only 

be  manipulated  via  invocations  of  the  type's  operations  can  be 

(9) 

enforced  at  compile  time.  Classes  in  SIMULA  may  be  used  as 

type  definitions,  but  do  not  protect  the  storage  representation 

from  direct  manipulation  by  programs  outside  of  the  class.  New 

linguistic  mechanisms  supporting  abstract  type  definitions  are 

being  developed  at  MIT  (clusters  in  the  CLU  language^  and  at 

Carnegie-Mellon  (forms  in  the  Alphard  language  . ) ; similar 

(12) 

mechanisms  are  being  incorporated  into  many  other  languages. 

Many  practical  problems  still  remain  to  be  resolved,  for  example, 
efficient  storage  allocation  for  objects  of  abstract  type. 

The  four  step  process  described  earlier  is  inherently  modular: 
each  abstraction  (data  or  functional)  of  the  design  is  intended 
to  be  realized  by  a separate  program  module.  These  modules  must 
all  be  independent  of  one  another,  with  explicitly  defined  inter- 
faces consisting  of  the  input  and  return  parameters  of  the  module. 
The  resulting  program  is  not  block-structured  s instead  there  is  a 
set  of  program  modules  all  existing  on  the  same  level  in  the  sense 


that  they  are  all  equally  usable  in  writing  new  programs.  In  reality, 
however,  modules  are  not  intended  to  be  equally  usable;  for 

ovamnlo  cnmo  1 oc  n y*  e*  i nf  lioarl  V»xr  nnlxr  r\rna 
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module.  In  addition,  it  is  convenient  in  programming  to  take  advan- 
tage of  the  close  relationship  among  some  modules  and  permit  them 
to  share  data  implicitly.  It  is  unclear  how  modules  should  be 

grouped  into  programs  in  the  absence  of  block  structure;  work  on 

. , . ..  . . , . (13.14,15) 

modular  program  construction  is  reported  in. 

Most  work  in  programming  languages  has  concentrated  on  sequen- 
tial constructs,  while  many  interesting  and  difficult  programs 
must  take  advantage  of  parallelism  or  contend  with  it.  In  general, 
mechanisms  suggested  for  parallelism ^ ^ are  deficient  in  that 
not  enough  structure  is  provided;  the  programmer  is  given  little 

guidance  in  using  the  mechanisms  in  a safe  manner.  Hoare’s 

no) 

monitor  construct'  provides  a synchronization  mechanism  with 

much  more  structure:  the  programmer  is  encouraged  to  limit 

synchronization  interaction  to  the  set  of  operations  of  a type, 

operating  on  a particular  object  of  that  type.  Research  on  monitor- 

(19) 

like  constructs  is  underway  at  Carnegie  Mellon,  the  California 
Institute  of  Technology,  and  M.I.T.  In  addition,  better 

ways  of  spawning  and  destroying  processes  than  "parbegin"  and 
" parend"  must  be  developed. 

Control  abstractions  permit  the  programmer  to  abstract  away 

from  control  details.  One  type  of  control  abstraction  that  appears 

to  be  useful  is  the  iterator,  which  iterates  over  all  the  values  in 

a collection  in  some  specified  order.  The  user  of  the  iterator 

is  not  concerned  with  the  details  of  the  iteration;  instead  he 

makes  use  of  a generalized  loop  construct.  Iterators  are 
(22) 

present  in  SETL  for  sets:  new  mechanisms  permitting  their  use 

for  user-defined  types  are  being  defined  at  Carnegie-Mellon  , 
M.I.T.  and  Los  Alamos. 
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There  is  growing  recognition  that  a program  must  do  more  than 
produce  correct  output  given  well-behaved  input;  reasonable 
behavior  in  the  face  of  erroneous  input  or  machine  failures  (e.g., 
parity  errors)  is  also  important.  Exception  handling  mechanisms 

support  the  development  of  such  well-behaved  programs.  Work  in  this 

. , • (23,24) 

area  is  reported  m 

Research  in  program  optimization  techniques  to  provide  efficient 
implementation  of  domain-specific,  very  high-level  languages  is 

discussed  in  Section  3 of  this  report.  The  techniques 
described  there  are  important  for  general-purpose  languages  as  well. 
The  programmer's  job  is  to  produce  a well-structured  and  correct 
program,  but,  of  course,  the  program  must  also  execute  with 
acceptable  efficiency.  If  the  compiler  is  capable  of  transforming 
a wel 1- structured  program  into  an  efficient  program,  then  the 
programmer  need  not  perform  the  transformation  himself,  and  the 
complexity  of  his  task  is  reduced. 

2 . 3 Specification  Techniques 

A specification  of  a program  is  a description,  in  some  well- 
defined,  formal  language,  of  the  behavior  on  which  a user  of  the 
program  can  depend.  The  program  itself  is  a specification  under 
this  definition;  however,  a specification  in  a different  language 
is  desirable  for  several  reasons.  Hopefully,  a specification 
language  permits  a shorter  description  that  states  what  the 

(25) 

program  does  without  saying  how  the  task  is  accomplished  (see 

for  a discussion  of  desirable  properties  of  specification  techniques) . 

In  addition,  it  is  intended  in  the  methodology  described  above  that  a 
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specification  for  an  abstraction  be  given  in  advance  of  the  implemen-a 
tation;  the  specification  not  only  states  the  properties  to  be  relied 

on  by  the  user  of  the  abstraction,  but  also  the  properties  to  be 
provided  when  the  abstraction  is  implemented.  Finally,  the 
specification  is  an  independent  statement  of  the  meaning  of  the 
abstraction  that  can  be  used  for  verification  of  both  uses  and 
implementations  of  the  abstraction. 

Specifications  must  be  given  for  both  functional  and  data 
abstractions.  For  functional  abstractions,  input/output  specifica- 
tions that  describe  properties  of  the  program  state  are  given 
for  each  procedure;  the  input  specification  describes  properties 
of  the  state  holding  prior  to  procedure  execution,  while  the  output 
specification  describes  properties  holding  after  execution  has 

completed.  Functional  specifications  are  no  more  than  Floyd 

(26) 

inductive  assertions  attached  to  an  entire  procedure.  Hoare 

(27) 

has  studied  how  functional  specifications  are  used  in  proofs. 

Speci f ications  for  data  abstractions  are  not  as  well  understood 

(25) 

as  are  specifications  for  functional  abstractions  (see  for  a 

recent  survey  of  this  material) . Input/output  specifications  are 

not  necessarily  the  method  of  choice  here.  Instead  methods  that 

describe  the  abstraction  as  a whole  in  terms  of  interactions  among 

the  operations  are  preferable.  Three  techniques  are  currently 

(28)  , . . (29,30) 

under  study;  abstract  model,  state  machine,  and  algebraic. 

None  of  these  techniques  has  been  fully  defined  as  yet.  The  main 

work  remaining  to  be  done  is  to  define  precise  semantics  for  each 

method  and  to  develop  proof  techniques  that  make  use  of  them. 

(31)  (11) 

Tnese  questions  are  being  studied  at  SRI  , ISt,  Carnegie-Mellon 
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and  at  M.I.T.  In  addition,  the  utility  of  the  techniques  must  be 
studied  by  using  them  to  specify  complex  data  abstractions;  only  the 

state  machine  technique  has  been  applied  to  reasonably  large  problems 

, (32) 

so  far. 

In  addition  to  techniques  for  specifying  data  and  functional 

abstractions,  work  is  needed  in  specifying  the  behavior  of  parallel 

programs.  Here,  the  most  promising  approaches  associate  the 

specif ications  with  program  units  similar  to  modules  supporting 

(19  33  34) 

data  abstractions.  ' ' Parallel  program  specifications 

are  still  in  a rather  rudimentary  state.  Work  in  this  area  is 
likely  to  impact  the  choice  of  linguistic  constructs  to  support 
parallel  programming. 

There  are  two  difficult  problems  associated  with  specifications. 
First,  the  specifications  describe  functional  behavior,  but  many 
interesting  procedures  have  side  effects.  Non- functional  behavior 
is  usually  modelled  by  extra  assignments;  this  technique  works  as 
long  as  no  sharing  of  data  is  taking  place.  Unfortunately,  sharing 
of  data  is  useful  in  many  applications.  Some  work  in  this  area  is 

4-  ^ • <34> 

reported  in 

The  second  problem  with  specifications  is  the  sheer  complexity 
of  the  tasks  being  specified.  The  difficulty  is  such  that  no  real 
task  has  ever  been  fully  and  formally  specified.  Even  for  fairly 
small  programs,  it  is  necessary  to  invent  abstractions  (subsidiary 
definitions)  in  order  to  write  an  intelligible  specification. 
Inventing  the  appropriate  abstractions  for  specifying  a large  program 
clearly  requires  considerable  intellectual  effort. 
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It  is  not  known  at  present  how  well  the  techniques  will  scale 
up  to  large  programs.  The  techniques  are  appropriate  for  describing 
the  meaning  of  individual  program  modules.  To  the  extent  that  the 
meaning  of  an  entire  system  can  be  expressed  in  terms  of  the 
meaning  of  a collection  of  modules,  the  techniques  can  be  used  to 
specify  the  entire  system  as  well.  Thus,  the  techniques  may  be 
useful  for  describing  the  design  of  actual  systems.  It  is  unclear 
whether  the  techniques  are  also  useful  for  requirements  specifica- 
tions; research  is  needed  in  this  area. 

Certain  extensions  to  specification  techniques  are  needed 
if  the  techniques  are  to  be  applied  to  real  systems.  The  behavior 
described  by  the  specifications  must  be  defined  even  when  problems 
arise  (poorly  formed  input,  machine  malfunction);  most  specification 
techniques  are  deficient  in  this  respect.  Moreover,  in  addition 
to  specifying  behavior,  constraints  on  program  performance  must 
also  be  specified.  Very  little  has  been  done  in  this  area;  an 
understanding  of  how  the  performance  of  a module  is  related  to 
the  performance  of  the  modules  it  uses  is  needed. 

Since  specifications  are  to  be  written  in  formal  languages 
with  well-defined  semantics,  it  is  clearly  possible  to  write  programs 
that  "understand"  specifications.  There  are  several  uses  for  such 
programs.  A specification  is  intended  to  describe  a concept  which 
exists  in  someone’s  head,  and,  of  course,  it  is  possible  that  the 
specification  does  not  describe  the  concept  correctly.  Properties 
of  specifications  are  being  defined, (especially  for  specifications 
of  data  abstractions'  ' ) that  can  be  checked  automatically; 

if  the  properties  do  not  hold,  the  specification  is  probably 


incorrect.  In  addition,  a specification  can  be  made  to  run;  a 

formal  specification  is  a program,  albeit  a very  inefficient  one. 

By  running  the  specification,  the  system  can  give  the  programmer 

additional  information  about  what  the  specification  means.  A 

running  specification  can  be  used  to  simulate  the  behavior  of  a lower 

level  abstraction,  so  that  a module  using  an  abstraction  can 

be  run  in  advance  of  the  existence  of  the  implementation  of  the 

abstraction.  Finally,  it  is  even  possible  that  a specification  can 

be  automatically  translated  into  an  efficient  implementation,  although 

(35 ) 

the  translation  procedure  must  be  extremely  clever?  there  is  an 

obvious  connection  between  this  work  and  automatic  programming 
(discussed  in  Section  3) . 

2.4  Verification 

Verification  is  the  activity  of  demonstrating  that  a program 
performs  in  accordance  with  its  specification.  Note  that  verifi- 
cation does  not  establish  the  correctness  of  the  program;  it  merely 
establishes  the  consistency  of  the  program  with  the  specification. 

If  verification  fails,  either  the  specification  or  the  program  may 
be  in  error.  If  verification  succeeds,  our  confidence  that  the 
program  really  does  what  it  is  supposed  to  do  depends  on  our 
confidence  in  the  verification  process  itself,  and  on  our  confidence 
that  the  specification  really  describes  the  desired  behavior. 

2.4.1  Testing 

Traditionally,  the  method  of  establishing  that  a program  per- 
forms as  desired  is  to  test  the  program  by  running  it  on  a number 
of  test  cases.  Testing  is  often  divided  into  two  activities: 
module  testing  or  debugging,  and  system  integration  and  testing. 

In  module  testing,  single  modules,  or  small  groups  of  modules,  are 
tested  using  tests  that  take  into  account  the  internal  structure 
of  the  modules.  Then  the  independently  tested  modules  are  combined 
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and  the  combination  tested  to  see  how  they  perform  together;  at 
this  stage  the  internal  structure  of  the  individual  modules  is 
ignored  as  much  as  possible. 

This  traditional  method  is  in  ill-repute  nowadays;  not  only 
is  it  extremely  costly,  but  it  is  also  ineffective;  it  is  common- 
place that  many  errors  remain  in  programs  that  have  undergone  the 
testing  process.  Actually,  the  method  is  not  inherently  evil  in 
itself;  the  problem  is  that  it  is  performed  so  haphazardly.  First 
of  all,  rarely  are  there  specifications  stating  with  any  precision 
what  program  modules  are  supposed  to  do;  debugging  is  based  on  the 
programmer's  intuitive  understanding  of  the  required  behavior.  It 
is  no  wonder  that  severe  difficulties  arise  at  system  integration; 

the  likelihood  that  both  the  user  of  a module  and  its  implementer 
agree  in  their  understanding  of  what  the  module  does  is  quite  small. 

Dijkstra  has  stated  that  "program  testing  can  be  used  to  show 

( 36 ) 

the  presence  of  bugs,  but  never  to  show  their  absence."  In 

fact,  testing  could  be  used  to  show  the  absence  of  bugs  if  it  were 
accompanied  by  an  analysis  showing  that  all  meaningful  cases  had  been 
tested.  However,  such  an  analysis  may  be  fully  as  difficult  as  a 
proof  that  the  program  is  consistent  with  its  specification.  Note 
that  the  analysis  cannot  be  based  only  on  the  external  specification 
of  a module;  the  structure  of  the  module  must  also  be  taken  into 
account.  In  program  proving,  both  external  behavior  and  program 
structure  have  a natural  role  in  the  proof  process. 
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A certain  amount  of  work  has  been  done  on  automatic  system 
testing  in  which  the  system  is  run  on  large  numbers  of  test  cases 
and  the  testing  system  keeps  track  of  which  program  paths  are 
exercised  by  the  tests.  The  existence  of  unexercised  paths  at  the 
end  of  the  test  indicates  that  the  tests  are  inadequate  (or  that 
some  parts  of  the  program  cannot  be  reached) . if  all  paths  are 
exercised,  however,  this  does  not  mean  the  program  works  correctly 
(for  example,  a loop  may  fail  on  the  second  iteration) ? again,  an 
analysis  is  needed  to  establish  that  the  testing  is  adequate.  In 
the  absence  of  the  analysis,  the  tests  can  increase  confidence 
in  the  program,  so  this  kind  of  testing  can  be  valuable.  When 
the  test  cases  are  developed  by  a different  group  of  people  than 
the  system  implementers , the  cases  constitute  a second,  independent 
test  of  the  program.  Also,,  the  test  cases  (with  some  modification) 
can  be  run  after  a system  modification  to  detect  unwanted  changes 
in  system  behavior.  However,  when  program  proving  becomes  a practical 
method,  it  will  be  preferable  to  testing. 

2.4.2  Program  Proving 

Program  proving  has  recently  become  an  area  of  practical 
interest  because  of  advances  in  programming  methodology.  What  is 
particularly  important  is  the  modular  nature  of  well- structured 
programs.  Programs  produced  by  the  four-step  process  described 
earlier  will  be  composed  t f a hierarchy  of  modules,  one  for  each 
abstraction.  The  verification  cf  such  a program  can  proceed  in  a 
hierarchical  fashion.  Each  module  is  proved  independently  of  the 
others;  the  theorem  to  be  proved  is  that  the  module  meets  its 
specification.  In  doing  this  proof,  specifications  of  the  lower 
level  modules  are  used  as  axioms,  as  are  the  properties  of  con- 
structs in  the  programming  language.  Stating  the  theorems  and 

axioms  remains  relatively  simple,  since  the  specifications 
describe  abstractions. 
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It  is  clear  that  if  proofs  are  to  be  given  about  programs  of 
practical  size,  then  mechanical  aids  are  needed  to  cope  with  the 
large  amount  of  detail  involved.  By  focusing  on  well- structured 
programs,  machine-aided  proofs  of  realistic  programs  will  be  fea- 
sible in  the  near  future.  The  proofs  of  individual 
modules  are  of  a size  that  existing  automatic  program  provers  can 
manage,  and  an  entire  program  can  be  verified  by  proving  each  of  its 
modules.  In  unstructured  programs  without  a good  decomposition,  all 
parts  of  the  program  can  interact  with  each  other  in  complicated 
ways  - modular  decomposition  of  the  proof  is  not  possible,  and  the 
complexity  and  size  of  the  proof  is  unmanageable. 


Research  in  automatic  verification  systems  is  in  progress 

at  ISI(37!  SRI{28!  Stanford (39!  IBM(40),  and  Edinburgh (41 * ; 

(42) 

a survey  of  this  work  may  be  found  in  . Many  problems  remain 

to  be  solved  before  these  systems  can  be  used  to  prove  practical 
programs.  It  is  particularly  important  that  the  systems  be  extended 
to  support  hierarchical  program  proofs.  Some  work  has  been  done 
in  using  the  specifications  of  a functional  abstraction  in  proving 
a program  using  that  abstraction;  work  on  using  specifications  of 
data  abstractions  is  just  beginning.  Ultimately,  proofs  should  be 
carried  out  in  the  context  of  a data  base  containing  information 
about  previously  proved  theorems  and  lemmas,  so  that  hierarchical 
proofs  can  be  performed  as  easily  as  possible. 


2 . 5 Management  of  Programming 

There  are  very  serious  issues  involved  in  organizing  and  adminis- 
tering a group  of  skilled  but  individualistic  programmers  in  the 
production  of  a large  and  complex  software  system:  the  challenge  is 
to  ensure  that  their  individual  products  will  be  available  on  time, 
and  can  then  be  combined  into  a unified  whole.  There  are  a number 
of  directions  from  which  this  problem  is  being  attacked:  formulation 

of  organizational  structures  that  contribute  to  and  enforce  good 
programming  practice;  study  of  the  psychological  and  behavioral 
mechanisms  that  are  at  work  in  the  programming  process;  and  develop- 
ment of  automated  tools  to  aid  in  the  organization  and  management 
of  a software  production  effort. 

Over  the  years,  there  have  been  numerous  ad  hoc  managerial 
techniques  developed  in  the  data  processing  context.  But  it  was 
only  with  the  advent  of  structured  programming  and  software  engi- 
neering that  some  principles  were  adumbrated  for  the  organizational 
structure  of  a programming  team.  The  prime  mover  in  this  effort 
has  been  IBM,  with  their  development  and  espousal  of  "improved 
programming  technologies." 

There  is  a computer  folk  law  that  the  structure  of  a program 

reflects  the  structure  of  the  organization  that  produced  it.  This 

concept  is  reflected  in  the  idea  of  chief  programmer  teams,  an 

organizational  structure  that  encourages  the  use  of  top-down  design 

(43) 

and  structured  programming  techniques.  In  this  approach,  the 

chief  programmer  focuses  on  the  design  of  the  top  level  modules, 
delegating  to  his  subordinates  the  detailed  tasks  of  implementing 
lower  level  modules.  This  organization  has  demonstrated  a signi- 
ficant degree  of  success,  and  efforts  continue  on  its  refinement 
and  extension. 
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The  question  of  the  effect  of  human  factors  on  software  develop- 
ment has  only  recently  begun  to  be  seriously  studied.  We  shall 
consider  in  Section  3 relationships  between  programming  languages 
and  human  cognitive  processes.  But  there  have  also  been  studies 

of  the  motivational  factors  underlying  programming  performance  and 

. . (44) 

the  human  behavioral  aspects  of  the  programming  activity. 
Methodologies  and  concepts  are  only  now  being  formulated  in  this 
field,  and  its  full  application  is  still  somewhat  in  the  future. 

There  are  a number  of  groups  working  on  automated  or  semi- auto- 
mated aids  to  the  program  development  process.  For  example,  IBM's 
Program  Development  Libraries  are  computerized  records  of  the 
various  stages  of  system  completion.  One  of  the  most  ambitious  of 

these  efforts  is  being  conducted  by  the  ISDOS  group  at  the  University 
(45) 

of  Michigan  . Though  this  project  has  other  goals  as  well,  it 
can  be  viewed  as  a system  that  maintains  an  on-line  data  base 
about  each  of  the  various  modules  of  the  system  under  construction: 
its  function,  who  is  responsible  for  it,  its  sources  of  input,  the 
destinations  of  its  outputs,  and  the  like;  in  addition,  ISDOS"  provides 
a unified  data  dictionary  and  directory  for  the  entire  system. 

Use  of  such  a system  provides  a manager  with  an  effective  tool  for 
coping  with  the  complexity  of  a large-scale  project. 

We  can  expect  to  see  more  systems  along  the  lines  of  ISDOS 
in  the  future.  A worthwhile  project  would  be  to  develop  a system  in 
which  the  programming  language,  the  specification  language,  and  a 
verification  system  are  made  available  to  the  programmer  in  a uni- 
fied manner.  Such  a system  would  be  very  useful  in  following  the 
four-step  program  development  process  described  earlier. 
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3 . Automatic  Programming 
3 . 1 Introduction 

Automatic  programming  research  seeks  to  lessen  the  cost  and 
complexity  of  software  construction  by  limiting  the  decisions  that 
need  to  be  made  by  people  during  the  programming  process.  The  goal 
of  this  work  is  to  reduce  the  number  of  decisions  that  a programmer 
must  make,  and  confine  even  these  to  issues  that  he  presumably 
understands  well.  The  intent  is  to  allow  the  programmer  to  concen- 
trate (in  greater  or  lesser  degreee)  on  his  problem  domain  and  on 
problem  solving  in  its  terms,  rather  than  on  computer-like  data  and 
algorithmic  structures.  The  level  of  man-machine  interaction  is 
thus  moved  away  from  the  machine  and  closer  to  the  person,  and  the 
responsibility  for  making  decisions  on  matters  that  relate  only 
to  the  computer  are  delegated  to  the  computer  itself.  If  the 
programmer's  task  is  made  easier,  he  should  be  able  to  perform  it 
more  quickly  and  reliably.  In  addition,  it  is  possible  that  the 
computer,  which  is  adapted  to  dealing  with  large-scale  complexity, 
will  be  able  to  make  many  programming  decisions  better  than  people 
customarily  do. 

The  assumption  of  programming  responsibility  by  a computer  is 
made  possible  by  restricting  the  scope  of  the  programming  effort 
to  an  area  that  is  well-understood  and  consequently  subject  to 
automation.  The  premise  of  automatic  programming  is  that  exten- 
sive programming  practice  in  a particular  area  develops  human 
understanding  of  relevant  programming  techniques  in  that  domain 
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to  the  point  where  they  can  be  automated.  Conventional  programming 
language  compilers  were  one  of  the  first  applications  of  this  idea, 
building  on  much  accumulated  expertise  in  assembly  language  pro- 
gramming: later,  compiler-compilers  were  able  to  automate  the  compiler 
building  process,  once  that  task  was  well  understood.  In  this 
section,  we  shall  survey  three  research  areas  that  are  based 
on  the  automation  of  human  decision-making:  nonprocedural  languages, 

data  management  systems,  and  knowledge-based  systems. 

3 . 2 Nonprocedural  Languages 

One  proposed  way  of  removing  decision-making  from  the  shoulders 
of  the  programmer  is  by  having  him  use  so-called  very  high-level (VHL) 
or  nonprocedural  programming  languages. ^ The  precise  meaning  of 
this  term  has  been  the  subject  of  much  debate  and  remains  somewhat 
imprecise.  But  the  basic  idea  is  clear:  a VHL  language  enables 

the  programmer  to  express  his  problem  in  terms  that  are  meaningful 
to  him  and  relevant  to  his  application  domain;  it  requires  less 
specification  of  computer-related  details  than  does  a conventional 
programming  language  . Thus,  the  programmer  is  free  to  concentrate 
on  an  abstract  view  of  the  problem  and  his  algorithmic  solution  to 
it,  while  ignoring  irrelevant  details.  This,  of  course,  is  a rela- 
tive definition;  and,  in  fact,  FORTRAN  may  be  regarded  as  non- 
procedural when  compared  with  assembly  language.  Current  research, 
however,  is  concerned  with  languages  that  are  nonprocedural  with 
respect  to  FORTRAN  or  PL/I . The  belief  is  that  using  such  languages 
will  be  significantly  easier  than  programming  in  conventional 

languages,  and  so  will  result  in  less  expensive  and  more  reliable 

(2) 

software.  A secondary  effect  should  be  the  lowering  of  the 

(3) 

expertise  level  needed  to  effectively  utilize  computers. 
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The  procedural  nature  of  conventional  programming  languages 
stems  from  the  resemblance  of  their  data  structuring  facilities 
and  underlying  computational  models  to  the  capabilities  supported 
by  conventional  von  Neumann  architecture  machines.  Nonprocedural 
languages  embody  various  features  that  reject  this  model.  Some 
of  the  more  important  of  these  are:  aggregate  data  structures, 

such  as  sets  and  relations,  together  with  facilities  to  readily 
manipulate  them:  associative  referencing,  with  retrieval  of  data 
afforded  by  content  rather  than  context  (location):  pattern-matching 
facilities;  very  powerful,  data-oriented  control  structures,  ^ 

such  as  implicit  loops,  backtracking,  and  data  flow  specifications. 
Hie  underlying  argument  is  that  features  such  as  these  are  more 
natural  for  the  expression  of  algorithms  than  those  provided  by 
conventional  languages.  We  observe  that  in  all  these  cases  the 
representation  of  the  high-level  concepts  in  terms  of  conventional 
machine  structures  is  not  immediate,  and  requires  decision-making 
in  order  to  choose  and  specify  an  implementation.  The  conversion 
from  abstract  computational  structures  to  those  that  resemble 
machine  features,  normally  done  by  the  programmer,  is  here  per- 
formed by  the  language  compiler. 


A 


-22- 


Research  in  nonprocedural  languages  has  two  goals:  to  design 

languages  that  are  truly  easy  to  use;  and  to  construct  implementa- 
tions of  these  languages  that  are  acceptably  efficient,  that  are 
able  to  specify  the  detailed  control  abdicated  by  the  programmer. 

In  both  of  these  areas,  the  tradeoff  between  generality  on  the  one 
hand,  and  efficiency  and  simplicity  on  the  other,  looms  very  large. 

A language  that  is  applicable  in  many  contexts  is  unlikely  to  be 
finely  tuned  for  use  or  performance  in  any  of  them.  Consequently, 
the  non-procedural  languages  that  have  been  and  are  being  pro- 
posed are  usually  of  prime  applicability  to  just  one  problem  area. 
Such  a language  possesses  just  the  right  set  of  data  structures, 
operators,  and  control  structures  appropriate  for  the  natural 
expression  of  algorithms  in  the  domain  for  which  it  was  designed; 
in  addition,  the  compiler  for  the  language  can  "know"  about  good 
implementation  techniques  for  this  problem  area.  The  point  is  that 
if  we  are  to  automate  decision-making  in  the  realm  of  programming, 
it  must  be  for  decisions  that  we  thoroughly  understand;  and  at 
this  time,  our  knowledge  and  experience  of  programming  style  and 
implementation  techniques  is  sufficiently  highly  developed  only 
within  the  prescribed  limits  of  various  individual  problem  domains. 

Most  nonprocedural  languages  to  date  have  been  intended  for  one 
of  two  problem  ^reas:  business  data  processing  or  artificial 

intelligence.  ^ Examples  of  such  languages  are  SSL,  ^ BDL,  ^ 
PLANNER,  ^ SAIL,  ^ and  QA4 . one  of  the  earliest  nonpro- 
cedural languages  was  SETL,  ostensibly  a general-purpose 

language,  but  actually  best-suited  for  applications  of  combinatorial 

mathematics.  There  are  more  recent  efforts  to  design  languages 

(12) 

for  other  areas,  such  as  mechanical  assembly.  As  expertise 

is  developed  in  desirable  programming  styles  for  other  application 
domains,  new  languages  for  these  areas,  which  incorporate  this 

knowledge,  will  be  designed. 
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3 • 3 Psychology  of  Programming 

To  date,  most  language  design  efforts  (including  those  for 
non-procedural  languages)  have  been  based  on  the  intuition  and 
experience  of  the  design  team,  with  respect  to  what  makes  programs 
easy  or  hard  to  write  and  read.  A new  research  movement  in  the 
psychology  of  computer  programming  is  attempting  to  put  language 
design  on  a more  scientific  basis.  The  goals  of  this  work  are  to 
identify  the  inherent  computational  models  that  people  naturally 
possess  and  thereby  determine  how  various  linguistic  features  con- 
tribute to  the  difficulty  of  programming.  Computer  scientists 
and  psychologists  are  working  towards  developing  experimental 
methodologies  for  the  design  and  evaluation  of  programming  lang- 
uages. (33,14)  A number  of  results  have  already  come  out  of  this 
work:  for  example,  the  data  base  access  language  SEQUEL  has  gone 
through  several  redesigns,  principally  on  the  basis  of  the  results 
of  its  psychological  evaluation.  35,16,17)  rj^e  center  for  this 
type  of  work  is  currently  IBM  Research  (Yorktown) , with  efforts 
also  underway  at  Toronto,  USC,  and  Indiana.  Though  still  at  a 
very  early  stage  of  development,  this  kind  of  research  will  undoubtedly 
grow  as  nonprocedural  languages  are  required  and  designed  for 
more  fields.  One  of  the  long-range  goals  of  this  work  is  to  reduce 
the  amount  of  training  and  expertise  needed  to  write  programs, 
and  thereby  make  computers  more  accessible  to  end  users  (as  dis- 
tinct from  specialists  in  programming) . 
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3 • 4 Program  Optimization 

Research  in  program  optimization  belongs  in  a survey  of  work 
that  is  attempting  to  solve  the  software  problem  because  the 
successful  design  and  use  of  easy-to-use  systems  (such  as  non- 
procedural languages)  is  entirely  predicated  on  the  development 
of  advanced  optimization  technologies  that  can  turn  high-level 
specifications  into  efficient  machine  code.  A bad  implementation 
strategy  for  a non-procedural  language  can  lead  to  a possibly 
exponential  degradation  of  performance  when  compared  with  conven- 
tional coding;  some  loss  of  run-time  efficiency  can  be  tolerated 
because  of  gains  in  other  parts  of  the  system  life-cycle,  but  it 
should  be  kept  within  a (small)  linear  factor.  Therefore,  we 
shall  need  optimizers  that  can  analyze  a nonprocedural  program, 
glean  high-level  information  from  it,  and  then  synthesize  a near- 
optimal  implementation  for  it. 

There  is  a long  history  of  research  in  code  optimization  for 

conventional  programming  languages,  and  some  contemporary  research 

is  an  outgrowth  of  these  earlier  efforts.  The  work  of  Allen's 

group  at  IBM  Research  has  largely  been  done  in  the  context  of  PL/I, 

but  has  been  instrumental  in  the  development  of  innovative  analytic 

methods  for  extracting  information  from  program  text.  They  are  now 

looking  at  novel  structures  for  optimizers,  especially  for  applica- 

(18) 

tion  to  the  non-procedural  environment.  Similarly,  Cheatham's 

group  at  Harvard  has  been  focusing  on  the  optimization  of  ECL,  a 
LISP-like  language. 


k 
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The  largest  amount  of  work  on  nonprocedural  optimization  has 
been  done  on  APL.  The  popularity  of  this  language,  and  the  poten- 
tial inefficiency  of  its  implementation,  have  motivated  several 
groups  to  attempt  its  compilation.  Much  of  this  research  has  con- 
centrated on  using  algebraic  properties  of  the  APL  operators  to 


effect  transformations  on  the  program  text. 


More  recent 


efforts  focus  on  global  restructuring  of  the  specified  algorithm. 

Perlis'  group  at  Yale,  and  the  ALTO  group  at  Xerox  PARC,  have 

considered  co-routine  implementations  for  arrays  and  implicit 

(20, 21) 

loops,  as  well  as  other  large  scale  transformations. 

The  SETL  group  at  NYU  has  for  several  years  been  studying  the 
problem  of  achieving  an  efficient  implementation  of  that  set- 
oriented  language.  They  have  been  particularly  concerned  with 
determining  facts  about  the  data  structures  of  a program  in  order 


to  choose  good  representations  for  them. 


Low  and  Rovner  at 


Rochester  have  been  working  on  a similar  problem  in  the  context  of 

(23) 

the  SAIL  language. 

The  Protosystem  I group  at  M.I.T.  is  developing  an  efficient 

(24) 

implementation  of  the  data  processing  language  SSL.  Their 

optimizer  is  of  interest  for  two  reasons.  It  makes  use  of  various 
heuristic  techniques  for  performing  global  optimizations  (involving 
such  issues  as  file  design  and  aggregation  of  computations) , whose 
complete  analysis  would  entail  unacceptable  combinatorial  explosion. 
And  it  utilizes  novel  sources  of  information,  other  than  the  static 
program  text,  in  making  decisions;  specifically,  it  refers  to  a data 
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base  of  environmental  information  provided  by  the  user,  and  may 
explicitly  query  the  user  for  data  not  provided  therein.  'These 
two  themes  will  be  of  increasing  importance  in  the  future:  global 

optimization  using  heuristics:  and  the  gathering  and  use  of  dynamic 
information  (such  as  typical  values  of  variables,  frequency  of  execu- 
tion of  various  program  segments,  and  the  like).  One  important 
source  for  information  will  be  previous  execution  histories  of 
the  program  in  question.  A program  may  well  be  recompiled  several 
times,  on  each  occasion  using  information  gathered  from  earlier 
execution  histories. 

3 . 5 Data  Management  Systems 

One  application  area  where  a great  deal  of  current  research  is 
directed  towards  the  automation  of  human  decision-making  is  in  the 
field  of  data  base  management.  There  is  a large  class  of  computer 
applications  that  are  characterized  by  a large,  centralized 
collection  of  data  that  is  used  by  many  different  application  pro- 
grams. Decisions  must  be  made  in  designing  these  data  bases, 
and  programs  written  to  access,  maintain,  and  update  them. 
Opportunities  for  error  are  present  in  all  these  cases  because 
of  the  difficulty  of  performing  these  tasks. 

In  a conventional  data  management  system,  a great  amount  of 
responsibility  lies  on  the  shoulders  of  the  data  base  administrator 
and  of  the  application  programmer  who  uses  the  data  base.  The  DBA 
must  design  the  data  base,  choosing  a logical  organization  for  the 
information  of  the  enterprise?  this  logical  organization  must  then 
be  expressed  in  terms  of  the  particular  data  structuring  facilities 
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that  the  data  base  system  provides  for  modelling  purposes.  The 
data  base  user  in  turn  must  be  familiar  with  this  structure  and 
write  his  programs  to  "navigate"  through  it,  locate  the  data  items 
of  interest  to  him,  and  perform  the  appropriate  computations  on 
them.  In  addition,  someone  must  write  programs  to  examine  all  new 
data  for  the  data  base  to  determine  correctness  and  validity. 
Conventional  systems  only  assume  from  the  programmer  the  responsi- 
bility for  knowing,  using,  and  maintaining  the  physical  represen- 
tations of  the  data  structures  that  comprise  the  data  base. 

Though  this  approach  does  represent  a major  improvement  over  earlier 
file  systems,  it  still  leaves  the  responsibility  for  a great  deal 
of  low  level  decision-making  in  the  hands  of  people.  Current 
research  seeks  to  automate  many  of  these  tasks,  and  lead  to  more 
reliable  and  easier  to  use  data  base  systems. 

Probably  the  most  unified  approach  to  these  problems  is  being 

(25) 

conducted  in  the  context  of  relational  data  bases.  The  rela- 

tional model  of  data  proposes  a single,  very  simple  and  uniform 
data  structure  with  which  to  model  the  user's  logical  view  of  his 
data  base.  This  logical  view  is  entirely  divorced  from  any  aspects 
of  the  data  base's  physical  organization;  the  responsibility  for 
choosing  representational  mechanisms  and  access  structures  for  the 
data  is  transferred  from  the  DBA  to  the  system.  Similarly,  the 
application  programmer  is  presented  with  a much  simpler  view  of  the 
data,  since  he  transacts  with  the  logical  version  of  the  data  and 
not  its  physical  representation. 


Virtually  all  of  the  information  in  a relational  data  base  is 
carried  in  data  values  rather  than  in  data  structures.  Such  a 
data  base  lends  itself  to  access  by  means  of  nonprocedural  techniques 
especially  associative  referencing.  There  have  been  a number  of 
nonprocedural  languages  proposed  for  accessing  relational  data 
bases.  ' ; These  languages  are  intended  both  to  provide  easier 

access  to  data  bases  for  programmers  writing  complex  programs  in 
some  conventional  programming  language,  and  to  enable  direct  access 
to  data  by  non-programmers.  Research  is  being  conducted  on  making 
such  languages  so  easy  to  learn  and  use  that  in  many  cases  the 
need  for  writing  "programs"  would  be  obviated. 

Thus,  the  relational  model  of  data  is  a "higher- level"  data 
model,  providing  a less  machine- like  view  of  data  than  those  afforded 
by  conventional  systems,  and,  therefore,  easing  the  tasks  of  both 
the  DBA  and  the  user.  The  superiority  of  this  approach  for  many 
applications  is  becoming  widely  accepted.  But  again,  the  critical 
question  is  how  the  kinds  of  decisions  normally  made  by  people  can  be 
competitively  made  by  the  data  base  system.  The  issues  in  this 
context  are  the  choice  of  physical  representations  for  the  (logical) 
relations,  the  selection  of  auxiliary  structures  to  provide  better 
access  to  these  storage  structures,  and  the  implementation  of  the 
non-procedural  queries  put  against  the  data  base.  In  a conventional 
environment,  these  decisions  are  all  made  by  people.  There  are 
several  research  groups  working  on  the  technology  of  relational 
data  base  implementation,  attempting  to  put  the  needed  "intelli- 
gence" into  a data  management  system.  The  largest  of  these  efforts 
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is  the  System  R project  at  IBM  Research  in  San  Jose.  Although 

they  are  quick  to  disclaim  any  official  product  implications 

for  their  work,  this  group  is  making  a major  attempt  to  develop  a 

relational  system  that  can  be  competitive  with  conventional 

systems  on  large-scale  data  bases.  Other  efforts  in  this  area 

include  the  INGRES  system  at  Berkeley,  PRTV  at  IBM  United  Kingdom, 

(29,30,  31) 

and  ZETA  at  the  University  of  Toronto. 

The  approach  taken  by  the  above-mentioned  groups  is  that  of 
dynamic  optimization  of  non-procedural  accesses  with  respect  to 
static  representational  structures.  That  is,  a representational 
scheme  is  chosen  for  a relation  at  the  time  of  its  definition; 
the  system  exercises  "intelligence"  in  using  this  structure  in  the 
most  effective  way  to  respond  to  queries.  Future  research  will  also 
include  intelligence  in  the  adaptive  selection  of  physical  repre- 
sentations based  on  the  pattern  of  use  of  the  data;  some  work  is 
being  done  on  this  now  at  M.I.T.  and  SRI.  ' There  is  also 

preliminary  interest  in  the  design  of  novel  computer  architectures, 
not  based  on  the  von  Neumann  model,  to  provide  hardware  support 
for  relational  data  bases;  the  major  concept  here  is  to  enable 
parallel  associative  access  of  the  data  base,  by  means  of  distri- 
buted logic,  placing  a microprocessor  on  each  head  of  a disk. 
Prototype  systems  are  being  developed  at  Toronto,  Utah  and  Florida. ^ 
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3 . 6 Data  Semantic s 

A new  wave  of  research  in  data  management,  currently  getting 
under  way,  seeks  to  further  automate  various  design  decisions 
associated  with  the  data  base,  and  make  it  even  easier  for  non- 
experts to  interact  with  it.  The  common  theme  running  through  this 
work  is  the  concept  of  data  semantics;  that  is,  the  data  manage- 
ment system  is  to  possess  an  "understanding"  and  "knowledge"  of 
the  meaning  of  a data  base,  as  well  as  of  its  structure,  and 

(35) 

utilize  this  information  to  perform  tasks  usually  done  by  people. 

The  most  widely  discussed  application  of  data  semantics  is  the 
provision  of  various  aids  to  the  DBA  to  assist  him  in  coping  with 
the  complexity  of  a large-scale  data  base,  with  respect  to  its  design, 
maintenance,  use,  and  reorganization.  (This  idea  motivated  the  pro- 
posal for  a conceptual  schema  level  of  description  for  data  bases, 

v (36) 

by  the  ANSI/SPARC  Committee  on  Data  Base  Management  Systems.) 

The  simplest  such  application  would  be  a "semantic  data  dictionary," 
which  would  provide  the  DBA  with  a map  not  only  of  the  data  base's 
structure,  but  of  the  meaningful  relationship  among  its  components. 

The  DBA  would  use  this  map  in  organizing  (and  reorganizing)  the 
data  base.  Such  a dictionary  would  also  be  useful  in  resolving  user 
queries,  and  could  eventually  enable  user  interface  languages  that 
would  not  demand  that  the  user  know  even  the  logical  structure  of 
his  data  base,  but  rather  allow  him  to  express  his  queries  in  terms 
semantically  meaningful  to  him.  Such  a language  could,  in  turn, 
provide  a needed  intermediate  level  of  support  for  natural  language 
access  to  a data  base. 
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A more  ambitious  application  of  data  semantics  is  an  automatic 
data  base  designer,  which  translates  specifications  of  the  semantic 
characteristics  of  the  application  into  recommended  logical  (and 
physical)  data  structures  for  the  data  base  representation.  Simi- 
larly, such  semantic  information  could  be  used  for  intelligent 
optimization  and  access  path  selection  for  user  queries.  Another 
use  of  data  semantics  would  be  in  semantic  integrity  checking, 
using  knowledge  of  the  data  base's  meaning  to  detect  and  correct 
erroneous  data  being  submitted  to  it. 

Research  is  focusing  on  developing  the  representational  mechan- 
isms needed  to  express  the  semantics  of  data,  and  designing  the 
systems  that  apply  semantics  to  the  above  problems.  One  approach 
is  that  cf  higher-level  data  models,  ways  of  describing  data  not 
in  terms  of  representational  data  structures  (even  simple  ones  like 
relations),  but  using  conceptual  primitives  such  as  entity,  attribute, 

and  relationship.  The  DIAM  model  of  Senko  at  IBM  Research  is  an 

(37) 

example  of  such  work.  A similar  effort,  which  focuses  on  the 

semantics  of  relational  data  bases,  is  being  conducted  by  Codd  at 
IBM,  while  a group  at  Toronto  is  using  semantic  networks  for  repre- 

/ 30  \ 

senting  the  meaning  of  a data  base.  Semantic  descriptions  of 

data  bases  are  being  used  for  automatic  data  base  design  at  the 

University  of  Michigan;  IBM  has  announced  a product,  based  on 

similar  principles  but  much  more  primitive,  for  generating  an 

(39  40) 

IMS  data  base  from  a semantic  description.  * 

Constraints  (logical  predicates  that  define  legal  states  of 
a data  base  or  legitimate  changes  to  it)  are  being  used  to  express 
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3 . 7 Knowledge  Based  Systems 

The  research  described  above  seeks  to  lessen  the  duties  of  the 
programmer  by  automating  many  of  the  decisions  that  he  would  other- 
wise have  to  make.  The  'knowledge"  that  these  systems  use  in 
making  decisions  is  derived  from  extensive  human  experience  in  the 
field  in  question;  when  people  really  understand  how  they  make 
decisions,  they  are  able  to  program  computers  to  make  those  deci- 
sions for  them.  However,  the  approaches  we  have  discussed  so  far 
are  intrinsically  viable  only  for  limited  domains;  that  is,  "pro- 
gramming expertise"  for  a particular  area  is  deeply  but  implicitly 
embedded  in  a programming  system  for  that  area,  with  the  result 
that  its  generality  is  strictly  limited.  Furthermore,  those  aspects 
of  programming  that  the  foregoing  work  seeks  to  automate  are 
by  and  large  restricted  to  "coding";  the  production  of  algorithmic 
programming  language  code  from  a precise  (though  less  procedural) 
specification  of  the  algorithm  to  be  implemented. 

There  is  another  theme  in  automatic  programming  research, 

the  so-called  "high-road",  that  seeks  to  understand  and  automate 

the  entire  human  intellectual  endeavor  called  programming,  beginning 

with  the  early  stages  of  problem  acquisition  and  strategy  formu- 
(43) 

lation.  The  goal  is  to  have  the  computer  assume  a dominant  share 

of  responsibility  in  the  production  of  software.  This  work  is 
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closely  allied  to  research  in  artificial  intelligence  in  its 
quest  for  an  understanding  and  simulation  of  human  cognitive  pro- 
cesses and  problem  solving.  One  distinguishing  aspect  of  most  of 
this  research  is  that  it  utilizes  an  explicit  "knowledge  base,"  which 
contains  information  about  some  application  domain  or  about  pro- 
gramming in  general,  and  uses  this  base  to  generate,  understand, 
or  verify  programs.  In  addition,  the  scope  and  range  of  these 
efforts  are  considerably  more  general  than  those  that  we  have  dis- 
cussed above.  In  these  "knowledge  based"  systems,  universal  pro- 
gramming mechanisms  applicable  to  any  problem  domain  are  utilized, 
resulting  in  a high  degree  of  generality.  Such  an  understanding 
of  the  programming  process  can  be  used  in  a variety  of  advanced 
applications . 

One  of  the  most  ambitious  applications  of  knowledge  based  systems 
is  the  generation  of  a program  from  a high  level,  declarative 
specification  of  its  intent,  which  is  devoid  of  any  algorithmic 
information  whatever.  in  order  to  accomplish  this,  a system  must 
possess  a semantic  knowledge  base,  expressing  information  about  the 
application  domain  in  question.  The  system  must  also  have  an  under- 
standing of  the  programming  constructs  available  to  it,  and  their 
utility  in  expressing  algorithms  for  the  domain.  The  system 
first  acquires  the  specifications  of  the  desired  program,  relates 
them  to  its  model  of  the  domain,  and  produces  a strategy  for  the 
program  structure;  it  then  uses  its  programming  knowledge  to  synthe- 
size this  program  using  available  constructs.  This  approach  is  much 
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more  general  than  the  work  on  nonprocedural  languages  because  of 
the  enormous  distance  of  the  program  specifications  from  machine- 
like algorithmic  structures,  as  well  as  the  applicability  of  these 
results  to  many  different  problem  areas.  In  general,  this  research 
is  more  speculative,  with  a longer  time-scale  and  a potentially 
greater  payoff,  than  what  we  have  discussed  above.  Its  success 
will  entail  not  only  the  automation  of  computer-related  aspects 
of  programming,  but  also  of  the  creative  aspects  of  heuristic 
problem-solving  that  precede  algorithm  formulation. 

There  are  many  subproblems  that  must  be  addressed  in  such 
research.  Principal  among  these  is  the  question  of  the  represen- 
tation of  knowledge,  how  the  information  about  the  application 
domain  is  to  be  expressed  for  use  by  the  system.  Another  major 
issue  is  the  isolation  of  the  conceptual  fundamentals  of  pro- 
gramming, breaking  them  down  into  a collection  of  primitives  that 
can  be  used  in  a synthesis  procedure,  and  then  devising  a system 
to  build  programs  using  these  abstract  primitives.  Finally,  of 
course,  there  is  the  problem  of  determining  from  the  functional 
specifications  provided  by  the  user,  and  the  base  of  domain  depen- 
dent knowledge,  a problem-solving  approach  to  be  synthesized  with 
the  available  language  constructs.  Most  current  researchers  address 
this  last  issue  by  concerning  themselves  with  one  domain  at  a time, 
which  has  a particular  problem  solving  methodology  that  the  system 
can  use. 
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There  are  various  possibilities  for  the  medium  of  expression 
for  program  functional  specifications.  The  ultimate  choice  would 
be  natural  language;  and  since  many  researchers  feel  that  the 
knowledge  needed  for  program  generation  is  closely  related  to 
that  needed  for  natural  language  understanding,  the  two  efforts 
are  often  closely  linked.  Interim  approaches  involve  some  high- 
level  but  formal  descriptive  language,  or  a list  of  input-output  pairs 
representative  of  the  transformation  the  program  is  to  effect. 

In  any  event,  a dialogue  between  the  user  and  the  system  is  envi- 
sioned as  being  necessary,  whereby  the  system  completes  the  user's 
initially  specified  model  by  means  of  a series  of  questions  and 
answers.  Specific  applications  on  which  current  research  efforts 

are  focusing  include  list  processing  programs,  message  distribution 

(44( 45) 

systems,  and  inventory  control  systems.  ' It  should  be 

re-emphasized  that  these  are  merely  specific  testbeds  being  used 
to  determine  general  programming  mechanisms  and  forms  for  their 
representation . 

There  are  several  other  areas  of  research  that  are  knowledge 
based  in  one  way  or  another.  One  of  these  is  the  study  of  automatic 
program  debugging.  Here  the  system  uses  its  explicit  knowledge 

of  general  programming  and  of  the  particular  problem  domain  to 
detect,  diagnose,  and  treat  logical  programming  errors.  While 
perhaps  somewhat  more  tractable,  this  problem  is  clearly  related 
to  program  synthesis;  in  fact,  it  has  been  suggested  that  program 
generation  can  be  viewed  as  the  debugging  of  an  initially  trivial 
program.  A related  idea  is  that  of  program  explaining  or  program 
documenting  systems,  which  can  relate  the  intent  of  a program  to 

its  textual  body  and  thus  assist  those  who  wish  to  understand  or 

v • . (47) 

change  it. 
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